
    Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis

    Sympiler is a domain-specific code generator that optimizes sparse matrix computations by decoupling the symbolic analysis phase from the numerical manipulation stage in sparse codes. The computation patterns in sparse numerical methods are guided by the input sparsity structure and by the sparse algorithm itself. In many real-world simulations, the sparsity pattern changes little or not at all. Sympiler takes advantage of these properties to symbolically analyze sparse codes at compile time and to apply inspector-guided transformations that enable low-level transformations of sparse codes. As a result, Sympiler-generated code outperforms highly optimized matrix factorization codes from commonly used specialized libraries, obtaining average speedups of 3.8X over Eigen and 1.5X over CHOLMOD.
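
    A concrete instance of this decoupling (a minimal hand-written sketch in the spirit of the approach, not Sympiler's actual generated code) is the sparse triangular solve Lx = b: which entries of x can become nonzero depends only on the sparsity patterns of L and b, so that reach set can be computed once, symbolically, and reused across numeric solves that share the pattern. The sketch assumes L is stored in compressed sparse column (CSC) form with the diagonal entry first in each column, and that x starts as a dense copy of b.

    #include <vector>

    // Symbolic phase: depth-first search over the column dependency graph of L,
    // started from each nonzero row index of b. The reverse post-order of the
    // visited columns (the "reach set") is a valid solve order.
    void reach(int col, const std::vector<int>& colPtr, const std::vector<int>& rowIdx,
               std::vector<char>& visited, std::vector<int>& order) {
        visited[col] = 1;
        for (int p = colPtr[col] + 1; p < colPtr[col + 1]; ++p)   // skip the diagonal
            if (!visited[rowIdx[p]])
                reach(rowIdx[p], colPtr, rowIdx, visited, order);
        order.push_back(col);
    }

    // Numeric phase: only the reach-set columns are touched, so repeated solves
    // with the same pattern skip the symbolic work entirely.
    void lsolve(const std::vector<int>& colPtr, const std::vector<int>& rowIdx,
                const std::vector<double>& val, const std::vector<int>& order,
                std::vector<double>& x) {
        for (int k = (int)order.size() - 1; k >= 0; --k) {        // reverse post-order
            int j = order[k];
            x[j] /= val[colPtr[j]];                               // divide by L(j,j)
            for (int p = colPtr[j] + 1; p < colPtr[j + 1]; ++p)
                x[rowIdx[p]] -= val[p] * x[j];                    // scatter the update
        }
    }

    When the sparsity pattern is fixed across a simulation, the symbolic phase can run at compile time and only the numeric phase remains in the generated code, which is what opens the door to the low-level transformations described in the abstract.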

    Identifying and Scheduling Loop Chains Using Directives

    Exposing opportunities for parallelization while explicitly managing data locality is the primary challenge in porting and optimizing existing computational science simulation codes to improve performance and accuracy. OpenMP provides many mechanisms for expressing parallelism, but it remains primarily the programmer's responsibility to group computations to improve data locality. The loop chain abstraction, in which data access patterns are included with the specification of parallel loops, provides compilers with sufficient information to automate the parallelism versus data locality tradeoff. In this paper, we present a loop chain pragma and an extension to the omp for directive that enable the specification of loop chains and high-level specifications of schedules on loop chains. We show example usage of the extensions, describe their implementation, and show preliminary performance results for some simple examples.
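
    The paper's pragma syntax is not reproduced here; the following hand-written OpenMP fragment (illustrative only, using standard directives) shows the pattern the abstraction targets: two parallel loops that sweep the same arrays back to back, which plain OpenMP must schedule in isolation even though declaring their access patterns as a loop chain would permit a fused, locality-friendly schedule.

    #include <vector>

    void two_sweeps(std::vector<double>& a, std::vector<double>& tmp) {
        const int n = (int)a.size();

        // Loop 1: smooth interior points into tmp.
        #pragma omp parallel for
        for (int i = 1; i < n - 1; ++i)
            tmp[i] = 0.5 * (a[i - 1] + a[i + 1]);

        // Loop 2: write the damped result back. It reads exactly the tmp[i]
        // produced above, but standard OpenMP has no way to know that, so tmp
        // streams through memory twice. A loop chain specification would expose
        // the producer-consumer pattern and let the compiler fuse or tile the pair.
        #pragma omp parallel for
        for (int i = 1; i < n - 1; ++i)
            a[i] = 0.9 * tmp[i];
    }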

    Representation-Independent Program Analysis

    Program analysis has many applications in software engineering and high-performance computing, such as program understanding, debugging, testing, reverse engineering, and optimization. A ubiquitous compiler infrastructure does not exist; therefore, program analysis is essentially reimplemented for each compiler infrastructure. The goal of the OpenAnalysis toolkit is to separate analysis from the intermediate representation (IR) in a way that allows the orthogonal development of compiler infrastructures and program analysis. Separating analysis from specific IRs will allow faster development of compiler infrastructures, the ability to share and compare analysis implementations, and in general quicker breakthroughs and evolution in the area of program analysis. This paper presents how we are separating analysis implementations from IRs with analysis-specific, IR-independent interfaces. Analysis-specific IR interfaces for alias/pointer analysis algorithms and reaching constants illustrate that an IR interface designed for language independence is capable of providing enough information to support the implementation of a broad range of analysis algorithms, and can also represent constructs within many imperative programming languages.
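
    As a sketch of what such a separation can look like (the names below are illustrative, not the toolkit's actual API), a reaching-constants analysis can be written once against an analysis-specific interface; each compiler infrastructure then implements that interface over its own IR.

    #include <map>
    #include <set>
    #include <string>

    // Everything a (simplified) reaching-constants analysis needs to ask of an IR,
    // expressed without reference to any concrete IR data structures.
    struct ReachConstsIRInterface {
        virtual ~ReachConstsIRInterface() = default;
        virtual bool hasNextStmt() = 0;
        virtual void* nextStmt() = 0;                            // opaque IR handle
        virtual std::set<std::string> defs(void* stmt) = 0;      // symbols written
        virtual bool constAssign(void* stmt, std::string& sym, long& val) = 0;
    };

    // The analysis never sees the IR itself: straight-line reaching constants
    // (real implementations also handle control flow), written once and reusable
    // under any infrastructure that implements the interface.
    std::map<std::string, long> reachingConstants(ReachConstsIRInterface& ir) {
        std::map<std::string, long> constants;
        while (ir.hasNextStmt()) {
            void* s = ir.nextStmt();
            for (const std::string& d : ir.defs(s))   // any redefinition kills the old fact
                constants.erase(d);
            std::string sym; long val;
            if (ir.constAssign(s, sym, val))          // record x = <literal constant>
                constants[sym] = val;
        }
        return constants;                             // constants reaching the exit
    }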

    Algorithms + Data Structures + Transformations = Portable Program Performance

    Many scientific applications require sparse matrix computations; finite element modeling and N-body simulations are two examples. It is difficult to write these codes in a way that is portable and also achieves high performance, both because of the sparsity of the matrices and because current architectures have deep memory hierarchies and multiple levels of parallelism. The implementations of such computations therefore become obfuscated by the hand tuning necessary to get performance on a specific architecture. Three performance aspects must be dealt with: matrix sparsity, data locality, and parallelism. Typically less than 1% of the entries in the matrix are non-zero [PS98], so it is necessary to use sparse data structures that store only the non-zeros. There are many different sparse data formats that save space and computation time for matrices with certain characteristics. Deep memory hierarchies and large relative memory latencies suggest the need for data locality optimizations that take advantage of data reuse. Parallelism allows bigger problems to be solved. Current work in this problem domain has either looked at separating the algorithm specification from the sparse data structure specification, or looked at locality and parallelism transformations for sparse computations that use specific sparse data structures. We propose having locality and parallelism transformations that can deal with any possible combination of sparse matrix computation and sparse data structure.
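
    To make the coupling concrete, here is a minimal sketch (ours, not from the paper) of sparse matrix-vector multiply written directly against the compressed sparse row (CSR) format: the y = A*x algorithm and the storage layout are entangled, so switching to, say, a blocked or diagonal format means rewriting the loop nest, which is exactly the combination problem the proposed transformations would handle.

    #include <vector>

    // Compressed sparse row storage: only the non-zeros are kept.
    struct CSR {
        int nrows;
        std::vector<int> rowPtr;     // size nrows+1: where each row's non-zeros start
        std::vector<int> colIdx;     // column index of each non-zero
        std::vector<double> val;     // value of each non-zero
    };

    // y = A * x, hard-wired to CSR: the inner loop's indirect access through
    // colIdx is what makes data locality depend on the sparsity pattern.
    std::vector<double> spmv(const CSR& A, const std::vector<double>& x) {
        std::vector<double> y(A.nrows, 0.0);
        for (int i = 0; i < A.nrows; ++i)
            for (int p = A.rowPtr[i]; p < A.rowPtr[i + 1]; ++p)
                y[i] += A.val[p] * x[A.colIdx[p]];
        return y;
    }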